
    Adaptive, fast walking in a biped robot under neuronal control and learning

    Human walking is a dynamic, partly self-stabilizing process relying on the interaction of the biomechanical design with its neuronal control. The coordination of this process is a very difficult problem, and it has been suggested that it involves a hierarchy of levels, where the lower ones, e.g., interactions between muscles and the spinal cord, are largely autonomous, and where higher-level control (e.g., cortical) arises only pointwise, as needed. This requires an architecture of several nested sensorimotor loops in which the walking process provides feedback signals to the walker's sensory systems, which can be used to coordinate its movements. To complicate the situation, at a maximal walking speed of more than four leg lengths per second, the cycle period available to coordinate all these loops is rather short. In this study we present a planar biped robot which uses the design principle of nested loops to combine the self-stabilizing properties of its biomechanical design with several levels of neuronal control. Specifically, we show how to adapt control by including online learning mechanisms based on simulated synaptic plasticity. This robot can walk at high speed (>3.0 leg lengths/s), self-adapting to minor disturbances and reacting robustly to abruptly induced gait changes. At the same time, it can learn to walk on different terrains, requiring only a few learning experiences. This study shows that the tight coupling of physical with neuronal control, guided by sensory feedback from the walking pattern itself and combined with synaptic learning, may be a way forward to better understand and solve coordination problems in other complex motor tasks.
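
    The predictive-plasticity idea in this abstract can be sketched compactly: a fixed, late reflex pathway trains an earlier, plastic predictive pathway through a correlation-based (differential Hebbian) weight update, so that after a few experiences the motor response anticipates the reflex. The sketch below is a minimal 1-D illustration under stated assumptions; the signal timing, filter constants, and learning rate are invented for the example and are not the robot's actual controller.

```python
import numpy as np

dt = 0.01
steps = 400

def lowpass(x, tau=0.1):
    # Leaky trace of an input signal (a simple synaptic filter).
    y = np.zeros_like(x)
    for k in range(1, len(x)):
        y[k] = y[k - 1] + dt * (x[k] - y[k - 1]) / tau
    return y

x1 = np.zeros(steps); x1[100] = 1.0   # early predictive cue (hypothetical sensor)
x0 = np.zeros(steps); x0[110] = 1.0   # late reflex trigger (e.g., ground contact)
u1, u0 = lowpass(x1), lowpass(x0)

w0, w1, mu = 1.0, 0.0, 2.0            # fixed reflex gain, plastic gain, learning rate
for experience in range(10):          # "few learning experiences"
    v_prev = 0.0
    for k in range(steps):
        v = w0 * u0[k] + w1 * u1[k]       # motor neuron output
        w1 += mu * u1[k] * (v - v_prev)   # differential Hebbian update ((v - v_prev) includes dt)
        v_prev = v

print(f"predictive weight after 10 experiences: w1 = {w1:.3f}")
```

    Because the cue's trace is still active when the reflex fires, the correlation with the rising output drives the predictive weight up, which is the basic mechanism behind anticipatory control here.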

    Reinforcement learning or active inference?

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and their sampling of the environment to minimize their free energy. Such agents learn the causal structure of the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof of concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
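
    The core loop of the free-energy argument can be sketched in a few lines: the agent performs gradient descent on a prediction-error term with respect to both its internal estimate (perception) and its action (behaviour), so a prior belief about the world is fulfilled by acting rather than by invoking reward. The toy 1-D environment, the quadratic free-energy term, and all gains below are illustrative assumptions, not the paper's mountain-car setup.

```python
# Perception and action as gradient descent on the same free-energy term.
dt, k_mu, k_a = 0.01, 5.0, 5.0
x, mu_est, x_star = 0.0, 0.0, 1.0   # world state, internal estimate, prior belief

for step in range(2000):
    s = x                            # sensation (noiseless, an assumption)
    # F = 0.5*(s - mu_est)**2 + 0.5*(mu_est - x_star)**2
    mu_est += k_mu * ((s - mu_est) + (x_star - mu_est)) * dt   # perception
    a = -k_a * (s - mu_est)          # action: descend F via ds/da = 1 (assumed)
    x += a * dt                      # the environment integrates the action

print(f"final state x = {x:.3f}, belief mu = {mu_est:.3f} (prior x* = {x_star})")
```

    The agent ends up driving the world to its prior expectation without any reward signal, which is the point the abstract makes against value-based formulations.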

    Minimalistic control of biped walking in rough terrain

    Toward a comprehensive understanding of legged locomotion in animals and machines, the compass gait model has been intensively studied for systematic investigation of complex biped locomotion dynamics. While most previous studies focused only on locomotion over flat surfaces, in this article we tackle the problem of bipedal locomotion over rough terrain using a minimalistic control architecture for the compass gait walking model. This controller uses an open-loop sinusoidal oscillation of the hip motor, which induces basic walking stability without sensory feedback. A set of simulation analyses shows that the underlying mechanism is a “phase locking” mechanism that compensates for phase delays between the mechanical dynamics and the open-loop motor oscillation, resulting in a relatively large basin of attraction in dynamic bipedal walking. Exploiting this mechanism, we also explain how the basin of attraction can be controlled by manipulating the oscillator parameters, not only on flat terrain but also on variously inclined slopes. Based on the simulation analysis, the proposed controller is implemented on a real-world robotic platform to confirm the plausibility of the approach. In addition, using these basic principles of self-stability and gait variability, we demonstrate how the proposed controller can be extended with simple sensory feedback such that the robot can autonomously control its gait patterns to traverse rough terrain.
    Funding: National Science Foundation (U.S.) (grant 0746194); Swiss National Science Foundation (grants PBZH2-114461 and PP00P2_123387/1).
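
    The phase-locking mechanism invoked here can be illustrated with a much simpler stand-in system: a damped pendulum (a crude proxy for the swing-leg dynamics, not the paper's compass-gait model) driven by an open-loop sinusoidal torque entrains to the drive, so different initial conditions converge to the same periodic motion. All parameters below are assumptions chosen for illustration.

```python
import numpy as np

g, l, b = 9.81, 1.0, 0.5        # gravity, pendulum length, damping
A, f = 2.0, 1.0                 # open-loop drive amplitude and frequency [Hz]
dt, T = 0.001, 30.0

def simulate(theta0):
    # Damped pendulum driven by an open-loop sinusoidal "hip" torque.
    theta, omega = theta0, 0.0
    for t in np.arange(0.0, T, dt):
        tau = A * np.sin(2.0 * np.pi * f * t)
        omega += (-(g / l) * np.sin(theta) - b * omega + tau) * dt
        theta += omega * dt
    return theta

# Different initial conditions entrain to the same phase-locked motion:
for theta0 in (-0.5, 0.0, 0.5):
    print(f"theta(T) from theta0 = {theta0:+.1f}:  {simulate(theta0):+.3f}")
```

    The convergence of all three runs to the same final state is a toy analogue of the large basin of attraction the abstract attributes to phase locking between body dynamics and the open-loop oscillation.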

    Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

    A confusingly wide variety of temporally asymmetric learning rules exists, related to reinforcement learning and/or to spike-timing-dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules often find use in control tasks, for example in robotics, where rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them, to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, along with some transitional cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine-learning (non-neuronal) context, a solid mathematical theory for TD learning has existed for several years. This can partly be transferred to a neuronal framework, too. On the other hand, a more complete theory has only now emerged for differential Hebbian rules. In general, the rules differ in their convergence conditions and their numerical stability, which can lead to very undesirable behavior when one tries to apply them. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time; thus it is necessary to remember the first stimulus in order to relate it to the later-occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here. Thus this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, to provide some guidance for possible applications.
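
    The contrast between the two rule classes can be made concrete with one toy pairing of an early and a late input: the TD-style weight is driven by the δ-error together with an eligibility trace on the input (output control), while the differential Hebbian weight is driven by an input trace times the derivative of the output (input control). Signal shapes, trace time constants, and learning rates below are illustrative assumptions, not any specific rule from the article.

```python
import numpy as np

dt, n = 0.01, 200
t = np.arange(n) * dt
x1 = np.exp(-((t - 0.5) ** 2) / 0.01)   # early input (CS-like)
x0 = np.exp(-((t - 1.0) ** 2) / 0.01)   # later input (US / "reward"-like)

def lowpass(x, tau):
    # Leaky trace of an input (one simple form of eligibility trace).
    y = np.zeros_like(x)
    for k in range(1, len(x)):
        y[k] = y[k - 1] + dt * (x[k] - y[k - 1]) / tau
    return y

u1 = lowpass(x1, 0.3)                    # slow trace of the early input

# TD-style rule: delta-error times a decaying eligibility trace (output control).
gamma, alpha, w_td, e = 0.99, 0.5, 0.0, 0.0
for k in range(n - 1):
    delta = x0[k] + gamma * w_td * x1[k + 1] - w_td * x1[k]   # x0 plays "reward"
    e = 0.99 * e + x1[k]
    w_td += alpha * delta * e * dt

# Differential Hebbian rule: input trace times derivative of output (input control).
mu, w_dh = 0.5, 0.0
for k in range(n - 1):
    v_now = x0[k] + w_dh * u1[k]
    v_next = x0[k + 1] + w_dh * u1[k + 1]
    w_dh += mu * u1[k] * (v_next - v_now)   # (v_next - v_now) already includes dt

print(f"TD-style weight after one pairing:             {w_td:+.4f}")
print(f"differential Hebbian weight after one pairing: {w_dh:+.4f}")
```

    Both rules grow a positive predictive weight for the earlier input, but through different error signals and different trace mechanisms, which is exactly where their convergence and stability properties diverge.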

    Do Humans Optimally Exploit Redundancy to Control Step Variability in Walking?

    It is widely accepted that humans and animals minimize energetic cost while walking. While such principles predict average behavior, they do not explain the variability observed in walking. For robust performance, walking movements must adapt at each step, not just on average. Here, we propose an analytical framework that reconciles issues of optimality, redundancy, and stochasticity. For human treadmill walking, we defined a goal function to formulate a precise mathematical definition of one possible control strategy: maintain constant speed at each stride. We recorded stride times and stride lengths from healthy subjects walking at five speeds. The specified goal function yielded a decomposition of stride-to-stride variations into new gait variables explicitly related to achieving the hypothesized strategy. Subjects exhibited greatly decreased variability for goal-relevant gait fluctuations directly related to achieving this strategy, but far greater variability for goal-irrelevant fluctuations. More importantly, humans immediately corrected goal-relevant deviations at each successive stride, while allowing goal-irrelevant deviations to persist across multiple strides. To demonstrate that this was not the only strategy people could have used to successfully accomplish the task, we created three surrogate data sets. Each tested a specific alternative hypothesis that subjects used a different strategy that made no reference to the hypothesized goal function. Humans did not adopt any of these viable alternative strategies. Finally, we developed a sequence of stochastic control models of stride-to-stride variability for walking, based on the Minimum Intervention Principle. We demonstrate that healthy humans are not precisely “optimal,” but instead consistently slightly over-correct small deviations in walking speed at each stride. Our results reveal a new governing principle for regulating stride-to-stride fluctuations in human walking that acts independently of, but in parallel with, minimizing energetic cost. Thus, humans exploit task redundancies to achieve robust control while minimizing effort and allowing potentially beneficial motor variability.
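
    The decomposition the abstract describes is easy to sketch: the hypothesized goal "constant speed at each stride", L_n / T_n = v*, defines a manifold in the (stride time, stride length) plane, and stride-to-stride deviations are projected onto directions tangent (goal-irrelevant) and normal (goal-relevant) to it. The synthetic data and target speed below are assumptions constructed to mimic the paper's qualitative finding, not the study's measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
v_star = 1.2                                       # hypothesized target speed [m/s]
n = 500
T = 1.0 + 0.03 * rng.standard_normal(n)            # stride times [s]
L = v_star * T + 0.01 * rng.standard_normal(n)     # stride lengths [m]

# Linearize the goal manifold L = v* T around the mean operating point.
scale = np.hypot(1.0, v_star)
tangent = np.array([1.0, v_star]) / scale          # goal-irrelevant direction
normal = np.array([-v_star, 1.0]) / scale          # goal-relevant direction

dev = np.column_stack([T - T.mean(), L - L.mean()])
print(f"goal-irrelevant variance (along manifold):   {(dev @ tangent).var():.2e}")
print(f"goal-relevant variance (normal to manifold): {(dev @ normal).var():.2e}")
```

    On data like this, variance along the manifold dominates variance normal to it, which is the signature of a controller that corrects only goal-relevant deviations while leaving goal-equivalent ones uncorrected.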

    25th Annual Computational Neuroscience Meeting: CNS-2016

    Abstracts of the 25th Annual Computational Neuroscience Meeting: CNS-2016, Seogwipo City, Jeju-do, South Korea, 2–7 July 2016.